The availability of challenging benchmarks has played a key role in the recent progress of machine learning. In cooperative multi-agent reinforcement learning, the StarCraft Multi-Agent Challenge (SMAC) has become a popular testbed for centralised training with decentralised execution. However, after years of sustained improvement on SMAC, algorithms now achieve near-perfect performance. In this work, we conduct new analysis demonstrating that SMAC is not sufficiently stochastic to require complex closed-loop policies. In particular, we show that an open-loop policy conditioned only on the timestep can achieve non-trivial win rates for many SMAC scenarios. To address this limitation, we introduce SMACv2, a new version of the benchmark where scenarios are procedurally generated and require agents to generalise to previously unseen settings (from the same distribution) during evaluation. We show that these changes ensure the benchmark requires the use of closed-loop policies. We evaluate state-of-the-art algorithms on SMACv2 and show that it presents significant challenges not present in the original benchmark. Our analysis illustrates that SMACv2 addresses the discovered deficiencies of SMAC and can help benchmark the next generation of MARL methods. Videos of training are available at https://sites.google.com/view/smacv2
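Below is a minimal sketch of the kind of open-loop baseline described above: a policy whose action distribution depends only on the timestep, never on observations. The class name and the SMAC-like `reset()`/`step()` interface are illustrative assumptions, not code from the paper.

```python
import numpy as np

class OpenLoopPolicy:
    """Illustrative open-loop policy: actions depend only on the timestep,
    never on the observation. All that is 'learned' is a table of
    per-timestep, per-agent action logits."""

    def __init__(self, n_agents, n_actions, max_steps, seed=0):
        self.rng = np.random.default_rng(seed)
        self.logits = np.zeros((max_steps, n_agents, n_actions))

    def act(self, t):
        """Sample one action per agent for timestep t, ignoring observations."""
        probs = np.exp(self.logits[t])
        probs /= probs.sum(axis=-1, keepdims=True)
        return [self.rng.choice(len(p), p=p) for p in probs]

def rollout(env, policy):
    """Roll out against a SMAC-like env; the real SMAC API differs in detail
    (e.g. it exposes available-action masks that a fair baseline must respect)."""
    env.reset()
    episode_return, t, done = 0.0, 0, False
    while not done:
        reward, done, _ = env.step(policy.act(t))
        episode_return += reward
        t += 1
    return episode_return
```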
Practising and honing skills forms a fundamental component of how humans learn, yet artificial agents are rarely trained specifically to perform them. Instead, they are typically trained end-to-end, in the hope that useful skills will be learned implicitly in order to maximise the discounted return of some extrinsic reward function. In this paper, we investigate how skills can be incorporated into the training of reinforcement learning (RL) agents in complex environments with large state-action spaces and sparse rewards. To this end, we created SkillHack, a benchmark of tasks and associated skills based on the game of NetHack. We evaluate a number of baselines on this benchmark, as well as our own novel skill-based method Hierarchical Kickstarting (HKS), which outperforms all other evaluated methods. Our experiments show that learning with prior knowledge of useful skills can significantly improve the performance of agents on complex problems. We ultimately argue that utilising predefined skills provides a useful inductive bias for RL problems, especially those with large state-action spaces and sparse rewards.
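As an illustration of how predefined skills can be injected into training, the sketch below shows a generic kickstarting-style auxiliary loss that regularises a student policy towards a pretrained skill ("teacher") policy. It is a simplified stand-in for the general idea, not the paper's Hierarchical Kickstarting implementation; the class name and coefficient are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class SkillKickstartingLoss(nn.Module):
    """Illustrative auxiliary loss: the student policy is pulled towards a
    pretrained skill policy (the 'teacher') in addition to its usual RL
    objective. A simplified sketch, not the HKS method from the paper."""

    def __init__(self, coef=0.5):
        super().__init__()
        self.coef = coef

    def forward(self, student_logits, teacher_logits):
        # KL(teacher || student) over the per-step action distribution.
        teacher = F.softmax(teacher_logits, dim=-1)
        log_student = F.log_softmax(student_logits, dim=-1)
        log_teacher = F.log_softmax(teacher_logits, dim=-1)
        kl = (teacher * (log_teacher - log_student)).sum(dim=-1)
        return self.coef * kl.mean()

# Usage sketch: total_loss = rl_loss + SkillKickstartingLoss()(student_logits, teacher_logits)
```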
Progress in reinforcement learning (RL) research is often driven by the design of new, challenging environments, a costly undertaking requiring skills orthogonal to those of a typical machine learning researcher. The complexity of environment development has only increased with the rise of procedural content generation (PCG) as a popular paradigm for producing diverse environments capable of testing the robustness and generalisation of RL agents. Furthermore, existing environments often require complex build processes, making it difficult to reproduce results. To address these issues, we introduce GriddlyJS, a web-based integrated development environment (IDE) based on the Griddly engine. GriddlyJS allows researchers to visually design and debug arbitrary, complex PCG grid-world environments using a convenient graphical interface, as well as visualise, evaluate, and record the performance of trained agent models. By connecting the RL workflow to the advanced functionality enabled by modern web standards, GriddlyJS allows interactive agent-environment demos that reproduce experimental results to be published directly to the web. To demonstrate the versatility of GriddlyJS, we use it to rapidly develop a complex compositional puzzle-solving environment, alongside arbitrary human-designed environment configurations and their solutions, for use in automatic curriculum learning and offline RL. The GriddlyJS IDE is open source and freely available at \url{https://griddly.ai}.
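For context, Griddly environments are defined in GDY, a YAML-based description language, and can be loaded from Python through Griddly's Gym wrapper. The snippet below follows the pattern from the Griddly documentation as best recalled; the class names, environment title, and GDY path should be treated as assumptions and checked against the installed version.

```python
import gym
from griddly import GymWrapperFactory, gd  # names assumed from the Griddly docs

# Register a Gym environment from a GDY (YAML) description, then instantiate it.
wrapper = GymWrapperFactory()
wrapper.build_gym_from_yaml(
    "MyPuzzle",                # hypothetical environment title
    "path/to/my_puzzle.yaml",  # placeholder path to a GDY file
    player_observer_type=gd.ObserverType.VECTOR,
    level=0,
)
env = gym.make("GDY-MyPuzzle-v0")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```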
Progress in deep reinforcement learning (RL) is driven by the availability of challenging benchmarks for training agents. However, the benchmarks widely adopted by the community are not explicitly designed to evaluate specific capabilities of RL methods. While environments exist for assessing particular open problems in RL (such as exploration, transfer learning, unsupervised environment design, or even language-assisted RL), it is generally difficult to extend them to richer, more complex environments once research moves beyond proof-of-concept results. We present MiniHack, a powerful sandbox framework for easily designing novel RL environments. MiniHack is a one-stop shop for RL experiments, with environments ranging from small rooms to complex, procedurally generated worlds. By leveraging the full set of entities and environment dynamics from NetHack, one of the richest grid-based video games, MiniHack enables the design of custom RL testbeds that are fast and convenient to use. With this sandbox framework, novel environments can be designed easily, using either a human-readable description language or a simple Python interface. In addition to a variety of RL tasks and baselines, MiniHack can wrap existing RL benchmarks and provides ways to seamlessly add additional complexity.
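A hedged example of the "simple Python interface": MiniHack tasks are registered with Gym, so a predefined environment can be created with a one-liner. The environment ID below is assumed from the MiniHack documentation, and the reset/step signatures follow the pre-0.26 Gym API that MiniHack originally targeted.

```python
import gym
import minihack  # noqa: F401  (importing registers the MiniHack-* environments)

env = gym.make("MiniHack-Room-5x5-v0")  # assumed task ID; see the docs for the full list
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```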
In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations. We structurally enforce that the joint-action value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning, and guarantees consistency between the centralised and decentralised policies. We evaluate QMIX on a challenging set of StarCraft II micromanagement tasks, and show that QMIX significantly outperforms existing value-based multi-agent reinforcement learning methods.
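The monotonicity constraint at the heart of QMIX is commonly implemented by generating the mixing weights with state-conditioned hypernetworks and taking their absolute value. The sketch below is a simplified illustration of that construction, not the authors' reference implementation; the layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    """Illustrative QMIX-style mixing network: per-agent Q-values are combined
    into a joint Q-value using weights produced by hypernetworks conditioned on
    the global state. Taking the absolute value of the weights enforces that
    Q_tot is monotonic in each agent's Q-value."""

    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        bs = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2
        return q_tot.view(bs, 1)
```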
Traditional data lakes provide critical data infrastructure for analytical workloads by enabling time travel, running SQL queries, ingesting data with ACID transactions, and visualising petabyte-scale datasets in cloud storage. They allow organisations to break down data silos, unlock data-driven decision-making, improve operational efficiency, and reduce costs. However, as deep learning takes over common analytical workflows, traditional data lakes become less useful for applications such as natural language processing (NLP), audio processing, computer vision, and applications involving non-tabular datasets. This paper presents Deep Lake, an open-source lakehouse for deep learning applications developed at Activeloop. Deep Lake retains the benefits of a vanilla data lake with one key difference: it stores complex data, such as images, videos, and annotations, as well as tabular data, in the form of tensors, and rapidly streams the data over the network to (a) a Tensor Query Language, (b) an in-browser visualisation engine, or (c) deep learning frameworks, without sacrificing GPU utilisation. Datasets stored in Deep Lake can be accessed from PyTorch, TensorFlow, and JAX, and integrate with numerous MLOps tools.
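A hedged sketch of the streaming workflow described above, using the open-source deeplake Python package: load a hosted dataset and wrap it as a PyTorch dataloader. The dataset path, tensor names, and keyword arguments are illustrative and may differ across deeplake versions.

```python
import deeplake

# Lazily open a hosted dataset (placeholder path) and stream it into PyTorch.
ds = deeplake.load("hub://activeloop/mnist-train")
loader = ds.pytorch(batch_size=64, num_workers=2)

for batch in loader:
    # Tensor names ("images", "labels") are assumed for this dataset.
    images, labels = batch["images"], batch["labels"]
    break
```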